# DexNDM - Oracle Rotation Policy Training via RL & Deploy

Modified from Hora. 

[Website](https://projectwebsitex.github.io/neudyn-reorientation/) | [Video](https://projectwebsitex.github.io/neudyn-reorientation/static/videos_lowres/demo_video_8(1).mp4)




## Environment Setup

The project is developed with Python 3.8. Please install Isaac Gym Preview 4 from the [NVIDIA website](https://developer.nvidia.com/isaac-gym). We use PyTorch 2.4 with CUDA 12.1.

Please refer to `environment.yaml` for the remaining environment setup.
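After installation, a quick sanity check can confirm that the interpreter and the core packages are visible. The snippet below is a minimal sketch and not part of the repository: `check_environment` is a hypothetical helper, and it only tests whether the `isaacgym` and `torch` packages can be located, without importing them or checking their versions.

```python
import sys
from importlib.util import find_spec


def check_environment(expected_python=(3, 8), packages=("isaacgym", "torch")):
    """Report whether the Python version matches and the packages are findable.

    Hypothetical helper for illustration; it does not import the packages.
    """
    report = {"python_ok": sys.version_info[:2] == expected_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = find_spec(name) is not None
    return report


print(check_environment())
```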



## Training

### Generate grasps

We use `gen_grasp_pool.py` to generate initial grasping poses. It generates grasping poses for each object instance at each scale, covering every object in the category, and runs in parallel. The grasping poses are generated with the `palm_down` wrist orientation and are then used for omni-wrist-orientation rotation training.

**Custom usage**: Set `obj_name` to the object category, `tot_obj_inst_idxes` to the indexes of the object instances for which you want to generate grasping poses, `scale_list` to the candidate scales, and `cuda_lists` to the available GPU indexes.


**Run**: Simply run `python gen_grasp_pool.py`. 
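To make the role of these settings concrete, the sketch below enumerates one job per (instance, scale) pair and spreads the jobs over the listed GPUs. It is an illustration only: the values are hypothetical, `assign_jobs` is not a function from the repository, and the round-robin assignment is an assumption about how parallel work could be distributed, not a description of what `gen_grasp_pool.py` actually does.

```python
from itertools import product

# Hypothetical values; the real ones are set inside gen_grasp_pool.py.
obj_name = "cylinder_default"
tot_obj_inst_idxes = [0, 1, 2]
scale_list = [0.5, 0.6]
cuda_lists = [0, 1]


def assign_jobs(inst_idxes, scales, gpus):
    """Assign every (instance, scale) pair to a GPU in round-robin order."""
    jobs = {gpu: [] for gpu in gpus}
    for i, (inst, scale) in enumerate(product(inst_idxes, scales)):
        jobs[gpus[i % len(gpus)]].append((inst, scale))
    return jobs


jobs = assign_jobs(tot_obj_inst_idxes, scale_list, cuda_lists)
for gpu, pairs in jobs.items():
    print(f"{obj_name} on cuda:{gpu} -> {pairs}")
```

With three instances and two scales there are six jobs in total, so each of the two GPUs receives three.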



### Training

Train the policy with the command below, replacing each `{PLACEHOLDER}` with your own value:

```bash
bash scripts/train_s1_leap_udir.sh {GPU_ID} 0 {EXP_NAME} {AXIS} {AXIS_SIGN}  task.env.reward.addAuxPoseGuidance=True task.env.gravityVal=-9.81 task.env.grasp_cache_name={GRASP_CACHE_NAME}   task.env.numObservations=320 task.env.objectDownfacingInitZ=0.38 task.env.addFingertipStateVelObs=True task.env.addObjectStateObs=True task.env.addRotp=True task.env.numEnvs={N_ENVS} train.ppo.minibatch_size={N_ENVS}  task.env.randomization.randomizeMassUpper=0.05 task.env.object.type={OBJ_CATEGORY} task.env.object.seperateInstGraspPose=True  task.env.randomization.randomizeScaleList='[0.5]' task.env.object.specifiedObjectIdx={OBJ_INST_LIST}  task.env.addForceObs=True task.env.addContactForceWithBinaryContacts=True task.env.addObjGoalObservations=True  train.ppo.max_agent_steps={MAXTRAINSTEPS}  task.env.omniWristOrnt={OMNIDIRSETTING} task.env.randomizeRotDir={RNDROTDIRSETTING}  
```


For example, to train on cylinders:


```bash
bash scripts/train_s1_leap_udir.sh {GPU_ID} 0 debug_all_scale_down_z_init0d38_cylinder z 1  task.env.reward.addAuxPoseGuidance=True task.env.gravityVal=-9.81 task.env.grasp_cache_name='leap_down_init0d38_cylinder'  task.env.numObservations=320 task.env.objectDownfacingInitZ=0.38 task.env.addFingertipStateVelObs=True task.env.addRotp=True task.env.addObjectStateObs=True task.env.numEnvs=30000 train.ppo.minibatch_size=30000  task.env.randomization.randomizeMassUpper=0.05 task.env.object.type=cylinder_default task.env.object.seperateInstGraspPose=True  task.env.randomization.randomizeScaleList='[0.5]' task.env.object.specifiedObjectIdx='0AND1AND2AND3AND4AND5AND6AND7AND8'  task.env.addForceObs=True task.env.addContactForceWithBinaryContacts=True task.env.addObjGoalObservations=True  train.ppo.max_agent_steps=15000000000  task.env.omniWristOrnt=True task.env.randomizeRotDir=True  
```
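Because the command is long, it can be easier to assemble it programmatically. The following is a minimal sketch, not part of the repository: `object_index_arg` and `build_train_command` are hypothetical helpers, the GPU index `0` is a placeholder, and only a few of the overrides from the cylinder example are included for brevity. The `AND` separator is taken from the `specifiedObjectIdx` value shown above.

```python
import shlex


def object_index_arg(indices):
    """Join instance indexes with 'AND', as in the example command."""
    return "AND".join(str(i) for i in indices)


def build_train_command(gpu_id, exp_name, axis, axis_sign, overrides):
    """Return the argument list for scripts/train_s1_leap_udir.sh.

    The second positional argument is fixed to 0, as in the example command.
    """
    argv = ["bash", "scripts/train_s1_leap_udir.sh",
            str(gpu_id), "0", exp_name, axis, str(axis_sign)]
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


# A subset of the overrides used in the cylinder example.
overrides = {
    "task.env.object.type": "cylinder_default",
    "task.env.object.specifiedObjectIdx": object_index_arg(range(9)),
    "task.env.numEnvs": 30000,
    "train.ppo.minibatch_size": 30000,
}
cmd = build_train_command(0, "debug_all_scale_down_z_init0d38_cylinder", "z", 1, overrides)
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to launch it
```

Keeping the overrides in a dictionary makes it harder to set `task.env.numEnvs` and `train.ppo.minibatch_size` to different values by accident, since both are set to the same number in the commands above.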


### Testing


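Run the policy in test mode to visualize the training results. The command repeats the training overrides and appends `headless=False task.env.numEnvs=100 test=True checkpoint={CKPT_FN}`: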

```bash
bash scripts/train_s1_leap_udir.sh {GPU_ID} 0 {EXP_NAME} {AXIS} {AXIS_SIGN}  task.env.reward.addAuxPoseGuidance=True task.env.gravityVal=-9.81 task.env.grasp_cache_name={GRASP_CACHE_NAME}   task.env.numObservations=320 task.env.objectDownfacingInitZ=0.38 task.env.addFingertipStateVelObs=True task.env.addObjectStateObs=True task.env.addRotp=True task.env.numEnvs={N_ENVS} train.ppo.minibatch_size={N_ENVS}  task.env.randomization.randomizeMassUpper=0.05 task.env.object.type={OBJ_CATEGORY} task.env.object.seperateInstGraspPose=True  task.env.randomization.randomizeScaleList='[0.5]' task.env.object.specifiedObjectIdx={OBJ_INST_LIST}  task.env.addForceObs=True task.env.addContactForceWithBinaryContacts=True task.env.addObjGoalObservations=True  train.ppo.max_agent_steps={MAXTRAINSTEPS}  task.env.omniWristOrnt={OMNIDIRSETTING} task.env.randomizeRotDir={RNDROTDIRSETTING}   headless=False task.env.numEnvs=100 test=True checkpoint={CKPT_FN}
```

For example, to test on cylinders:

```bash
bash scripts/train_s1_leap_udir.sh {GPU_ID} 0 debug_all_scale_down_z_init0d38_cylinder z 1  task.env.reward.addAuxPoseGuidance=True task.env.gravityVal=-9.81 task.env.grasp_cache_name='leap_down_init0d38_cylinder'  task.env.numObservations=320 task.env.objectDownfacingInitZ=0.38 task.env.addFingertipStateVelObs=True task.env.addObjectStateObs=True task.env.addRotp=True task.env.numEnvs=30000 train.ppo.minibatch_size=30000  task.env.randomization.randomizeMassUpper=0.05 task.env.object.type=cylinder_default task.env.object.seperateInstGraspPose=True  task.env.randomization.randomizeScaleList='[0.5]' task.env.object.specifiedObjectIdx='0AND1AND2AND3AND4AND5AND6AND7AND8'  task.env.addForceObs=True task.env.addContactForceWithBinaryContacts=True task.env.addObjGoalObservations=True  train.ppo.max_agent_steps=15000000000  task.env.omniWristOrnt=True task.env.randomizeRotDir=True      headless=False task.env.numEnvs=100 test=True checkpoint={CKPT_FN}
``` 



### Evaluation for Saving Trajectories

Roll out the trained policy and save the successful trajectories for behavior cloning (BC) training:

```bash
bash scripts/train_s1_leap_udir.sh {GPU_ID} 0 {EXP_NAME} {AXIS} {AXIS_SIGN}  task.env.reward.addAuxPoseGuidance=True task.env.gravityVal=-9.81 task.env.grasp_cache_name={GRASP_CACHE_NAME}   task.env.numObservations=320 task.env.objectDownfacingInitZ=0.38 task.env.addFingertipStateVelObs=True task.env.addObjectStateObs=True task.env.addRotp=True task.env.numEnvs={N_ENVS} train.ppo.minibatch_size={N_ENVS}  task.env.randomization.randomizeMassUpper=0.05 task.env.object.type={OBJ_CATEGORY} task.env.object.seperateInstGraspPose=True  task.env.randomization.randomizeScaleList='[0.5]' task.env.object.specifiedObjectIdx={OBJ_INST_LIST}  task.env.addForceObs=True task.env.addContactForceWithBinaryContacts=True task.env.addObjGoalObservations=True  train.ppo.max_agent_steps={MAXTRAINSTEPS}  task.env.omniWristOrnt={OMNIDIRSETTING} task.env.randomizeRotDir={RNDROTDIRSETTING}    test=True task.on_evaluation=True  checkpoint={CKPT_FN}  task.env.additionalTag={ADDITIONAL_TAG} task.env.maxEvaluateEnvs={MAX_TEST_ENVS}  
```



### Deploy

**Base Policy**: Set `invdyn_v2_log_path` to the path of the base policy's checkpoint.

**Residual Policy**: Set `delta_action_model_full_hand_ckpt_fn` to the residual policy's checkpoint folder.


```bash
bash scripts/deploy_leap.sh debug_allscales invdyn
```



## Acknowledgements

The repo is built upon [Hora](https://github.com/HaozhiQi/hora). We thank the authors for their great project.


